Abstract

Respondent-driven sampling (RDS) is a procedure to sample from hard-to-reach
populations. It has been widely used in several countries, especially in the
monitoring of HIV/AIDS and other sexually transmitted infections. Hard-to-reach
populations have had a key role in the dynamics of such epidemics and must inform
evidence-based initiatives aiming to curb their spread. In this paper, we present
a simple test for network dependence for a binary response variable and estimate
the prevalence of the response variable. We also propose a binary regression model
that takes the RDS structure into account through a
latent random effect with a correlation structure. The proposed model is illustrated
with an RDS study of HIV and syphilis among men who have sex with men, implemented
in Campinas (Brazil).
1 Introduction

Respondent-driven sampling (RDS) was originally formulated by Heckathorn [1] as
a first-order Markov process, assumed to reach equilibrium after a given
number of waves (originally estimated as six). It is a sampling scheme used to access
hard-to-reach populations, e.g. heavy drug users [2]. RDS has been widely used in several
countries and by well-known public health institutes [3].

More recent developments understand RDS as a Markov chain Monte Carlo procedure, formally
defined in [4] and later used by the same authors in a comprehensive series of simulations [5].

An alternative perspective, yet to be fully developed, understands the networks
obtained by RDS as a branching process that violates the basic assumptions of Markovian
processes [6] and proposes the use of stochastic context-free grammars to analyze databases
generated by RDS.

Whatever approach is taken to the analysis of RDS-based data, there is
nowadays a consensus that RDS constitutes a powerful strategy to assess hard-to-reach
populations, such as crack users, for whom RDS generates samples which are substantially
different from those based on institutional random samples [7]. In the same way, RDS
mitigates some biases traditionally associated with chain-referral samples, such as those
secondary to the role of the so-called "super-recruiters", whose very existence is blocked
by the establishment of a priori recruiting quotas in RDS studies [8].

To learn about the prevalence of a specific characteristic of the population,
different estimators have been developed for RDS [1, 9-12]. However, there is little research
on estimating risk factors for hard-to-reach populations taking the RDS design into
account. In this paper, we propose a strategy for carrying out regression analysis for RDS
data.



2 Methods 

2.1 Respondent-driven sampling 



Heckathorn [1] proposed the use of a snowball sampling method [13] to sample from
hard-to-reach populations. The proposed sampling scheme is called respondent-driven
sampling, or simply RDS. In snowball sampling, the data are collected according to
a chain-link recruitment process: a few participants, called seeds, are chosen from
the population of study; these participants are asked to recruit future participants from the
same population group, who are in turn asked to recruit further participants from the same
population, and so on. This process forms a network of recruits. In respondent-driven
sampling, the participants also provide information about their personal network size,
and each individual has a unique number or code allowing us to connect recruiters and
participants.

In general, we are interested in some quantities (or variables) associated with each
participant in the sample. These quantities may be influenced by the interaction among
the participants. We call this association network dependence. In this sense, the
variance within a recruiter-recruitee dyad (pair) tends to be different (most likely
smaller) than the variance between two interviewees not connected by a
referral link [14].

If a quantity of interest is a categorical variable, then it is possible to build a
contingency table with the recruiter values in the columns and the participant values in the
rows. Dependence between the status of the quantity for the recruiters and
the status of the quantity for the participants suggests network dependence.
Therefore, a Pearson independence test for contingency tables can be used to check for
evidence of network dependence.
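
As an illustration of the test described above, the following minimal sketch computes the Pearson chi-square statistic for a 2x2 recruiter-by-participant table. The function name is hypothetical, and the Yates continuity correction is an assumption: the text does not say whether a correction was applied. With the correction, the counts of Table 2 give a p-value of about 0.0014 and those of Table 1 a p-value below 0.0001.

```python
import math


def pearson_chi2_2x2(table, yates=True):
    """Pearson chi-square test of independence for a 2x2 table.

    table[r][c] is the number of participants with status r whose
    recruiter has status c. Returns (statistic, p_value) on 1 df.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / n
            diff = abs(table[r][c] - expected)
            if yates:
                # Continuity correction (assumed): shrink |O - E| by 0.5.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # Upper tail of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts from Tables 1 and 2: rows = participant, columns = recruiter.
hiv_table = [[478, 27], [33, 10]]
syphilis_table = [[481, 49], [70, 19]]
hiv_stat, hiv_p = pearson_chi2_2x2(hiv_table)
syph_stat, syph_p = pearson_chi2_2x2(syphilis_table)
print(f"HIV: chi2 = {hiv_stat:.2f}, p = {hiv_p:.2g}")
print(f"Syphilis: chi2 = {syph_stat:.2f}, p = {syph_p:.2g}")
```

The tail probability uses the identity between a chi-square variable with one degree of freedom and a squared standard normal, so the sketch only applies to 2x2 tables.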
RDS estimators

Suppose we are interested in a characteristic A which may or may not be observed in each
individual. The simplest estimator of the prevalence of the characteristic A, $\theta_A$, is the
naive estimator given by

$$\hat{\theta}_A^{(naive)} = \frac{n_A}{n}, \qquad (1)$$

where $n$ is the sample size, $n_A = \sum_{i=1}^{n} \mathbb{1}_A(i)$, and $\mathbb{1}_A$ is an indicator function with
$\mathbb{1}_A(i) = 1$ if individual $i$ presents the characteristic A and $\mathbb{1}_A(i) = 0$ otherwise. If the
sample size is large and independence among individuals is a reasonable assumption, then
the 95% confidence interval is $\hat{\theta}_A^{(naive)} \pm 1.96\,\{\hat{\theta}_A^{(naive)}(1 - \hat{\theta}_A^{(naive)})/n\}^{1/2}$. However, the independence
assumption may be too strong for respondent-driven sampling data.
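
The naive estimator and its interval can be sketched as follows. The function name and the sample are hypothetical (the counts are not the study's data), and the interval is only meaningful under the independence assumption discussed above.

```python
import math


def naive_prevalence(indicators, z=1.96):
    """Naive prevalence estimate with a normal-approximation interval.

    indicators: sequence of 0/1 values, 1 if the individual presents A.
    Assumes independent observations, which RDS data may violate.
    """
    n = len(indicators)
    n_a = sum(indicators)  # number of individuals presenting A
    theta = n_a / n
    half_width = z * math.sqrt(theta * (1.0 - theta) / n)
    return theta, (theta - half_width, theta + half_width)


# Hypothetical sample: 8 of 100 individuals present the characteristic.
sample = [1] * 8 + [0] * 92
theta_hat, (lower, upper) = naive_prevalence(sample)
print(f"{theta_hat:.4f} ({lower:.4f}; {upper:.4f})")
```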

An estimator that takes the network structure into account was proposed in [9]; it is
called RDS I. The RDS I estimator was improved by Volz and Heckathorn [11]. The improved
estimator is called RDS II, and it is given by

$$\hat{\theta}_A^{(RDS\ II)} = \frac{\sum_{i=1}^{n} \mathbb{1}_A(i)\,\delta_i^{-1}}{\sum_{i=1}^{n} \delta_i^{-1}}, \qquad (2)$$

where $\mathbb{1}_A(i)$ indicates whether participant $i$ presents the characteristic A and $\delta_i$ is
called the degree, defined as the number of friends from the same
population that participant $i$ declares to have. The authors provide an estimator for the
variance of (2). They also provide a simulation study showing that the confidence
intervals built using the RDS II estimator are better, in terms of average coverage
probability, than the confidence intervals built with the naive estimator. Recently, another
estimator, called RDS III, has been developed [12]. However, this estimator is not explored
in this paper.
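
A minimal sketch of the RDS II point estimate, written as the inverse-degree weighted proportion of Volz and Heckathorn, is given below. The function name and the data are hypothetical, and the variance estimator mentioned above is not implemented.

```python
def rds2_prevalence(indicators, degrees):
    """RDS II point estimate: inverse-degree weighted proportion.

    indicators: 0/1 values, 1 if the participant presents A.
    degrees: self-reported network sizes (must be positive).
    """
    if len(indicators) != len(degrees):
        raise ValueError("indicators and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every reported degree must be positive")
    weights = [1.0 / d for d in degrees]
    numerator = sum(w for w, y in zip(weights, indicators) if y == 1)
    return numerator / sum(weights)


# Hypothetical data: participants with A report larger networks.
has_a = [1, 0, 0, 1, 0]
degree = [10, 2, 5, 20, 4]
estimate = rds2_prevalence(has_a, degree)
print(round(estimate, 4))
```

In this toy sample the naive proportion is 2/5, while the weighted estimate is lower because the two participants with the characteristic report large degrees and are therefore down-weighted. When all degrees are equal, the estimate reduces to the naive proportion.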



2.2 Binary regression 

Let $Y_i$ be a variable representing a characteristic of interest of the $i$-th individual
interviewed in a respondent-driven sample, where $Y_i = 1$ if the characteristic of interest is
observed on individual $i$, and $Y_i = 0$ otherwise, for $i = 1, 2, \ldots, n$.

Risk factors can be incorporated in a binary regression model as follows:

$$Y_i \sim \mathrm{Bernoulli}(\theta_i),$$
$$g(\theta_i) = \eta_i = x_i^{T}\beta, \quad i = 1, 2, \ldots, n, \qquad (3)$$

where $x_i$ is a vector of possible risk factors, $\beta$ is the vector of risk effects and $g(\cdot)$ is a link
function. If the link function is the logit function, $g(z) = \mathrm{logit}(z) = \log\{z/(1 - z)\}$, then
the regression is called logistic regression.
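
To make the link function concrete, the sketch below evaluates the linear predictor, maps it to a probability with the inverse logit, and computes the Bernoulli log-likelihood of model (3). All names, coefficients and data are hypothetical and serve only as an illustration.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(x, beta):
    """Inner product of a covariate vector and the coefficient vector."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def bernoulli_log_likelihood(y, X, beta):
    """Log-likelihood of independent Bernoulli responses under a logit link."""
    total = 0.0
    for yi, xi in zip(y, X):
        theta = inv_logit(linear_predictor(xi, beta))
        total += yi * math.log(theta) + (1 - yi) * math.log(1.0 - theta)
    return total


# Hypothetical design: an intercept and one binary risk factor.
X = [[1, 0], [1, 1], [1, 1], [1, 0]]
y = [0, 1, 0, 0]
beta = [-2.0, 1.0]
theta_unexposed = inv_logit(linear_predictor([1, 0], beta))
theta_exposed = inv_logit(linear_predictor([1, 1], beta))
log_lik = bernoulli_log_likelihood(y, X, beta)
print(round(theta_unexposed, 3), round(theta_exposed, 3), round(log_lik, 3))
```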

However, model (3) is valid only when the characteristic of interest is independent
among the individuals in an RDS study, that is, when there is no network
dependence.
If the contact network is known, then a latent term can be included in the logistic
model so that the network structure is taken into account. This is done using a latent
Gaussian Markov random field, i.e.

$$Y_i \sim \mathrm{Bernoulli}(\theta_i),$$
$$g(\theta_i) = \eta_i = x_i^{T}\beta + \omega_i, \quad i = 1, 2, \ldots, n, \qquad (4)$$

where $\omega_i$ is a latent effect of the network structure. The latent effects are modeled using
the following conditional auto-regressive (CAR) model, proposed by [15]:



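$$\omega_i \mid \omega_{-i}, \tau, d \sim N\left(\frac{1}{d + n_i}\sum_{j:\, i \sim j} \omega_j,\ \frac{1}{\tau (d + n_i)}\right), \qquad (5)$$

where $n_i$ is the number of contacts of individual $i$ (number of connections), $i \sim j$ denotes the
set of individuals connected to $i$, $\tau$ is a precision hyper-parameter and $d$ is a diagonal
parameter. To complete the model, we set vague priors for $\beta$, $\tau$ and $d$.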
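
The sketch below shows how the recruitment links translate into a CAR structure. It assumes the proper CAR form in which the latent effect of individual i, given all others, has mean equal to the sum of its neighbours' effects divided by (d + n_i) and variance 1/(tau (d + n_i)); this corresponds to a joint precision matrix tau (D + d I - A), where A is the adjacency matrix and D the diagonal matrix of contact counts. The function names and the small network are hypothetical.

```python
def car_precision(n, edges, tau, d):
    """Precision matrix tau * (D + d*I - A) of the latent network effects.

    n: number of individuals; edges: (i, j) recruiter-recruitee pairs,
    0-based; tau: precision hyper-parameter; d: diagonal parameter.
    """
    neighbours = [set() for _ in range(n)]
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    precision = [[0.0] * n for _ in range(n)]
    for i in range(n):
        precision[i][i] = tau * (d + len(neighbours[i]))
        for j in neighbours[i]:
            precision[i][j] = -tau
    return precision, neighbours


def conditional_moments(i, omega, neighbours, tau, d):
    """Mean and variance of omega[i] given all other latent effects."""
    n_i = len(neighbours[i])
    mean = sum(omega[j] for j in neighbours[i]) / (d + n_i)
    variance = 1.0 / (tau * (d + n_i))
    return mean, variance


# Hypothetical network: seed 0 recruits 1 and 2; participant 1 recruits 3.
edges = [(0, 1), (0, 2), (1, 3)]
omega = [0.4, 0.0, -0.2, 1.0]
Q, nb = car_precision(4, edges, tau=2.0, d=1.0)
mean_1, var_1 = conditional_moments(1, omega, nb, tau=2.0, d=1.0)
print(round(mean_1, 4), round(var_1, 4))
```

Because recruitment links are treated as undirected, a participant is a neighbour of both the recruiter and any recruitees, which is the same convention used for adjacent regions in spatial applications.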

This type of model is well known in Bayesian spatial statistics, where neighboring
regions are considered as connections [16]. Inference is based on the marginal posterior
distribution of each parameter. These posterior marginal distributions are obtained using
the integrated nested Laplace approximation (INLA) [17]. Model comparison is done
using the deviance information criterion (DIC) [18].
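
For reference, the criterion of [18] adds the effective number of parameters, pD, to the posterior mean deviance, where pD is the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters. The sketch below computes both quantities from a set of deviance draws. The function name and the numbers are hypothetical, and the draw-based computation is an assumption made for illustration: INLA obtains these quantities from its approximations rather than from samples.

```python
def dic_summary(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, pD) from posterior deviance draws.

    pD = mean deviance - deviance at the posterior mean of the parameters;
    DIC = mean deviance + pD. Smaller DIC indicates a preferable model.
    """
    mean_deviance = sum(deviance_draws) / len(deviance_draws)
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d, p_d


# Hypothetical deviance values for a single model.
draws = [101.0, 99.0, 103.0, 97.0]
dic, p_d = dic_summary(draws, deviance_at_posterior_mean=96.0)
print(dic, p_d)
```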



3 Application: HIV and Syphilis of MSM population 
in Campinas, Brazil 

The RDS study carried out by de Mello et al. (2008) [19] was the first large RDS study
implemented in Brazil in a single location (Campinas, São Paulo state). This study was
part of a comprehensive initiative launched by Horizons-USAID aiming to better assess
the HIV/AIDS epidemic among gay men worldwide, using new methods targeting
hard-to-reach populations [20].

The study comprised 658 men who have sex with men (MSM) and was preceded by
a comprehensive formative study. The inclusion criteria for a participant were: (i) born
male; (ii) had anal or oral sex with another man or transvestite in the past six months;
(iii) 14 years of age or older; (iv) residing in the metropolitan area of Campinas. Participants
were compensated for enrolling in the study and for each eligible man they successfully
recruited into the study. The maximum number of referrals was 3. Some recruitment
chains were exceedingly long, comprising over 20 successive recruitment waves. In this
sense, according to the original RDS formulations, equilibrium should have been reached. Figure
1 presents the observed network in the Campinas RDS study. The initial recruitment
started with 10 seeds. Seven additional seeds were added 4-6 months after the study
started due to slow recruitment. Another six seeds were added eight months after the
study started. Additionally, seven potential participants who arrived after the 10th
month at the study site without a coupon were treated as seeds.




Figure 1. Recruitment pattern of men who had sex with other men. The larger circles 
represent seeds and smaller circles represent subsequent recruitees. 



Subsequent, larger Brazilian studies were conceived as multicity studies. As such,
they deal with a pool of local networks instead of a single, larger network. The continental
size of Brazil, together with its pronounced geographic and social heterogeneity, makes
the analysis of such pooled data and the respective weighting a formidable challenge. As
shown in a previous paper by our research group [21], even considering a single city (Rio de
Janeiro) belonging to this multicity study (which comprised 10 cities all over the country,
as of 2009-2010), there were evident structural bottlenecks (secondary to the structural violence
affecting Rio de Janeiro's drug scenes [22]) that hampered the very progress of the
recruiting process. A later analysis of the geographic dimensions of another RDS-based
study, carried out in Uganda's villages [23], did not confirm our findings, most probably
due to the striking social, geographic and demographic differences between Uganda's
villages and the violence-laden large metropolitan scenes where the Rio de Janeiro study
took place.

Whatever the underlying reasons for these and other discrepancies, we
chose here to take advantage of a large single-site study with gay men. Although homophobic
crimes and other crimes driven by sexual identity and race do unfortunately exist in Brazil
(as described by the participants themselves [19]), the gay scene in Campinas (as well as in
other major Brazilian urban areas) can be defined as an open scene, not affected by the
same structural bottlenecks disrupting Rio de Janeiro drug scenes to the point of making
some of them impervious to different attempts of researchers and health professionals to
work in partnership with local leaderships and native outreach workers.

On a side note, the first (and, to the best of our knowledge, so far the only)
simulation study on the accuracy of the RDS I estimator was parametrized
using the same data from de Mello's study [24].



3.1 Testing the network dependence and estimating prevalences 

Table 1 is a contingency table of the HIV serostatus of the participants and their
recruiters. The Pearson independence test rejects the null hypothesis of independence
(p-value < 0.0001), suggesting network dependence for HIV serostatus.
Analogously, Table 2 is a contingency table of the syphilis serostatus of the participants
and their recruiters. The Pearson independence test rejects the null hypothesis of
independence (p-value = 0.0014), suggesting network dependence for syphilis
serostatus. Therefore, we have some evidence that we should include the network
structure to estimate the prevalence and to find risk factors associated with HIV and syphilis
for the MSM population in Campinas.

Table 3 provides the estimated prevalences using the naive estimator (1) and the
RDS II estimator (2), and their corresponding 95% confidence intervals. Since there is
some evidence of network dependence for HIV and syphilis serostatus, the estimated
prevalences that should be considered are those using the RDS II estimator, i.e. 7.1% (4.7;
9.6) for HIV and 9.4% (1.5; 17.4) for syphilis.



3.2 Regression analysis 

In order to identify risk factors for HIV and syphilis serostatus, we use a binary regression
model with logistic link function. Hence, we have two different models: the usual
logistic regression (3), LogReg, and the logistic regression with latent network effect (4),
NetLogReg.

The results for HIV serostatus are summarized in Table 4. The DIC suggests that
the logistic regression with latent network effect is better, which agrees with the fact that
there is some evidence of network dependence in these data. Although we estimated
the regression coefficients, the results are interpreted as odds ratios (OR). Participants
who received any educational material in the past 12 months had three times the odds
of having HIV (OR=3.03, 95%CI 1.21-8.42) compared to those who did not receive
any educational material. Participants older than 25 years had four times the odds of
having HIV (OR=4.11, 95%CI 1.68-10.62) compared to those younger than 25 years.
Participants with no more than a high school degree had almost three times the odds
of having HIV (OR=2.91, 95%CI 1.04-9.22) compared to participants with at least a college
education. Participants who did not engage in UIAI (unprotected insertive anal intercourse)
in the last two months had 2.7 times the odds of having HIV (OR=2.70, 95%CI 1.08-7.46)
compared to participants who engaged in UIAI in the last two months.
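
The odds ratios quoted above are obtained by exponentiating the coefficients and interval limits of Table 4. When the text reports the contrast opposite to the table row (for example Yes versus No when the row is No), the coefficient is negated and the limits are swapped. The sketch below illustrates this with two NetLogReg rows; the function name is hypothetical.

```python
import math


def odds_ratio(coef, lower, upper, flip_reference=False):
    """Convert a log-odds coefficient and its limits to an odds ratio.

    flip_reference=True reports the inverse contrast, which negates
    the coefficient and swaps the interval limits.
    """
    if flip_reference:
        coef, lower, upper = -coef, -upper, -lower
    return math.exp(coef), math.exp(lower), math.exp(upper)


# NetLogReg rows of Table 4 (coefficient, lower limit, upper limit).
age_or = odds_ratio(1.4142, 0.5200, 2.3626)  # older than 25 vs younger
edu_or = odds_ratio(-1.1091, -2.1311, -0.1953, flip_reference=True)  # Yes vs No
print([round(v, 2) for v in age_or])
print([round(v, 2) for v in edu_or])
```

Small differences in the last digit relative to the text can arise because the table coefficients are themselves rounded.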

The results for syphilis serostatus are summarized in Table 5. The DIC suggests
that there is no meaningful difference between the two models. This seems to contradict
the dependence test; however, due to missing data, the sample size had to be reduced from 658
to 545 participants. This reduction in sample size removed several observed social
connections, leading to a weaker network dependence. Therefore, for this sample, the usual
logistic regression was chosen to describe the factors related to syphilis serostatus.

Participants who declared themselves transvestite had 2.49 times the odds of having
syphilis (OR=2.49, 95%CI 1.03-5.77) compared to those who declared themselves men.
Participants older than 25 years had 2.93 times the odds of having syphilis (OR=2.93,
95%CI 1.54-5.73) compared to those younger than 25 years. Participants who live in
Campinas city had 2.48 times the odds of having syphilis (OR=2.48, 95%CI 1.02-6.81)
compared to those who live in other cities. Participants who consider themselves sex
workers had four times the odds of having syphilis (OR=3.97, 95%CI 1.63-9.51) compared to
those who do not consider themselves sex workers. Participants who had any sexually
transmitted infection (STI) symptom in the past year had three times the odds of having
syphilis (OR=3.02, 95%CI 1.62-5.65) compared to participants who did not have
any symptom of STI in the past year.

4 Discussion 

In this paper, we present a strategy to model binary response variables from
respondent-driven sampling data. First, the network dependence of the response variable should
be tested; we propose to do so by building a contingency table with the
quantity of interest for the recruiters and the participants and running an independence test.
If there is any evidence of network dependence of the response variable, we suggest using
the RDS II estimator rather than the naive estimator to estimate the prevalence of the
quantity of interest. The binary regression model with latent effects can be an alternative
to regression models for RDS data that ignore the network structure.

We observed some evidence of network dependence for HIV and syphilis
serostatus. Using the RDS II estimator, the estimated prevalences are 7.1% (4.7; 9.6) for HIV and
9.4% (1.5; 17.4) for syphilis.

There are some issues that still need to be addressed. The RDS II estimator relies
on the sampling-with-replacement assumption, and the biases introduced by sampling
without replacement are unknown. Volz and Heckathorn (2008) [11] discuss this and some
other issues related to the use of the RDS II estimator in practice. Another important
issue not tackled in this paper is missing data. When information was
missing, the affected observations were removed and the observed network was
broken. Therefore, imputation methods for network data are needed.
The binary regression with the latent network effect assumes that the observed network
contains all the information about the social network. However, the network observed
from respondent-driven sampling data is incomplete. This is due to the limited number of
friends each person can bring and also to the fact that each individual cannot participate
more than once in the study. Hence, for future research, we intend to reconstruct the social
network of the sample using the degree information and other explanatory variables;
given the estimated social network, we could then directly apply model (4).

References 

1. Heckathorn DD. Respondent-driven sampling: A new approach to the study of
hidden populations. Social Problems. 1997;44:174-199.

2. Salganik MJ, Fazito D, Bertoni N, Abdo AH, Mello MB, Bastos FI. Assessing
Network Scale-up Estimates for Groups Most at Risk of HIV/AIDS: Evidence From a
Multiple-Method Study of Heavy Drug Users in Curitiba, Brazil. American Journal
of Epidemiology. 2011;174(10):1190-1196.

3. Malekinejad M, Johnston LG, Kendall C, Kerr LR, Rifkin MR, Rutherford GW.
Using Respondent-Driven Sampling Methodology for HIV Biological and Behavioral
Surveillance in International Settings: A Systematic Review. AIDS and Behavior.
2008;12(1):S105-S130.

4. Goel S, Salganik MJ. Respondent-driven sampling as Markov chain Monte Carlo.
Statistics in Medicine. 2009;28:2202-2229.

5. Goel S, Salganik MJ. Assessing respondent-driven sampling. Proceedings of the
National Academy of Sciences of the United States of America. 2010;107(15):6743-6747.

6. Poon AFY, Brouwer KC, Strathdee SA, Firestone-Cruz M, Lozada RM, Kosakovsky
Pond SL, et al. Parsing Social Network Survey Data from Hidden Populations Using
Stochastic Context-Free Grammars. PLoS ONE. 2009;4(9):e6777.

7. Oteo Pérez A, Benschop A, Korf DJ. Differential Profiles of Crack Users in
Respondent-Driven and Institutional Samples: A Three-Site Comparison. European
Addiction Research. 2012;18:184-192.

8. Tiffany J. Respondent-Driven Sampling in Participatory Research Contexts:
Participant-Driven Recruitment. Journal of Urban Health. 2006;83:113-124.

9. Heckathorn DD. Respondent-driven sampling II: Deriving valid population
estimates from chain-referral samples of hidden populations. Social Problems.
2002;49(1):11-34.

10. Salganik MJ, Heckathorn DD. Sampling and estimation in hidden populations
using respondent-driven sampling. Sociological Methodology. 2004;34:193-239.

11. Volz E, Heckathorn DD. Probability Based Estimation Theory for Respondent
Driven Sampling. Journal of Official Statistics. 2008;24(1):79-97.

12. Gile KJ. Improved Inference for Respondent-Driven Sampling Data with
Application to HIV Prevalence Estimation. ArXiv e-prints. 2010 Jun.

13. Goodman LA. Snowball Sampling. The Annals of Mathematical Statistics.
1961;32(1):148-170.

14. McPherson M, Smith-Lovin L, Cook JM. Birds of a Feather: Homophily in Social
Networks. Annual Review of Sociology. 2001;27:415-444.

15. Besag J. Spatial Interaction and the Statistical Analysis of Lattice Systems. Journal
of the Royal Statistical Society: Series B (Statistical Methodology). 1974;36(2):192-236.

16. Besag J, York J, Mollié A. Bayesian image restoration, with two applications in
spatial statistics. Annals of the Institute of Statistical Mathematics. 1991;43:1-20.
doi:10.1007/BF00116466.

17. Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian
models by using integrated nested Laplace approximations. Journal of the Royal
Statistical Society: Series B (Statistical Methodology). 2009;71:319-392.

18. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of
model complexity and fit. Journal of the Royal Statistical Society: Series B
(Statistical Methodology). 2002;64(4):583-639.

19. de Mello M, Pinho AA, Chinaglia M, Tun W, Barbosa Junior A, Ilario MCFJ, et al.
Assessment of risk factors for HIV infection among men who have sex with men in
the metropolitan area of Campinas city, Brazil, using respondent-driven sampling.
Horizons Final Report. Washington, DC: Population Council; 2008.

20. Geibel S, Tun W, Tapsoba P, Kellerman S. HIV vulnerability of men who have
sex with men in developing countries: Horizons studies, 2001-2008. Public Health
Reports. 2010;125(2):316-324.

21. Toledo L, Codeço CT, Bertoni N, Albuquerque E, Malta M, Bastos FI; Brazilian
Multicity Study Group on Drug Misuse. Putting respondent-driven sampling on the
map: insights from Rio de Janeiro, Brazil. Journal of Acquired Immune Deficiency
Syndromes. 2011;57:S136-S143.

22. Bastos FI, Caiaffa W, Rossi D, Vila M, Malta M. The children of mama coca:
Coca, cocaine and the fate of harm reduction in South America. The International
Journal on Drug Policy. 2007;18(2):99-106.

23. McCreesh N, Johnston LG, Copas A, Sonnenberg P, Seeley J, Hayes RJ, et al.
Evaluation of the role of location and distance in recruitment in respondent-driven
sampling. International Journal of Health Geographics. 2011;10:56.

24. Albuquerque E, Bastos FI, Codeço C. Assessing Respondent-Driven Sampling in
the estimation of STDs prevalence in populations structured in complex networks.
In: Costa LF, Evsukoff A, Mangioni G, Menezes R, editors. Complex Networks:
Second International Workshop, CompleNet 2010, Rio de Janeiro, Brazil, October
13-15, 2010, Revised Selected Papers, vol. 1. 1st ed. Springer; 2011. p. 108-118.

Tables 

Table 1. HIV serostatus for the participants and their recruiters.

                     Recruiter
Participant     Negative   Positive
Negative           478        27
Positive            33        10



Table 2. Syphilis serostatus for the participants and their recruiters.

                     Recruiter
Participant     Negative   Positive
Negative           481        49
Positive            70        19

Table 3. Estimated prevalence for HIV and syphilis and the corresponding
95% confidence interval.

                                   Estimator
             Naive                        RDS II
HIV          0.0789 (0.0577; 0.1001)      0.0711 (0.0466; 0.0955)
Syphilis     0.1155 (0.0911; 0.1399)      0.0944 (0.0146; 0.1741)



Table 4. Estimated effects and 95% credible intervals for the logistic
regression (LogReg) and logistic regression with network structure
(NetLogReg) with HIV serostatus as response variable.

                    LogReg (95% CI)                NetLogReg (95% CI)

(Intercept)         -4.2076 (-7.0755, -1.8333)     -5.1342 (-8.4060, -2.3608)
Educational material in the past two months? (Yes, No)
  No                -0.8593 (-1.7351, -0.0616)     -1.1091 (-2.1311, -0.1953)
Gender identity (Male, Transvestite, Others)
  Transvestite       0.2809 (-0.7307, 1.2241)       0.0062 (-1.3697, 1.2326)
  Others             0.3668 (-0.7745, 1.3809)       0.5984 (-0.6983, 1.7596)
Age category (< 25, > 25)
  > 25               1.2033 (0.4629, 1.9826)        1.4142 (0.5200, 2.3626)
Belongs to a gay NGO? (Yes, No)
  No                -0.3553 (-1.3367, 0.7136)      -0.9402 (-2.0846, 0.3129)
Any physical violence ever against gays and trans? (Yes, No)
  Yes                0.5123 (-0.2135, 1.2313)       0.7455 (-0.1077, 1.6023)
Total number of partners in the past 2 months (0, 1, >1)
  1                  0.7446 (-1.2697, 3.2865)       0.8539 (-1.3817, 3.6636)
  >1                 1.3142 (-0.5269, 3.7020)       1.3213 (-0.7057, 3.9303)
Consider self as sex worker? (Yes, No)
  No                -0.2374 (-1.2433, 0.8524)      -0.1729 (-1.4574, 1.2522)
Any symptoms of STI in the past year? (Yes, No)
  No                -0.5148 (-1.2260, 0.2141)      -0.3671 (-1.2284, 0.5270)
At least college degree? (Yes, No)
  No                 0.7460 (-0.1148, 1.6842)       1.0666 (0.0346, 2.2212)
UIAI in the last 2 months? (Yes, No)
  Yes               -0.9568 (-1.8539, -0.1426)     -0.9926 (-2.0101, -0.0802)

DIC                 268.20                         251.97
pD                   12.12                          44.85
Table 5. Estimated effects and 95% credible intervals for the logistic
regression (LogReg) and logistic regression with network structure
(NetLogReg) with syphilis serostatus as response variable.

                    LogReg (95% CI)                NetLogReg (95% CI)

(Intercept)         -1.1234 (-2.6638, 0.3757)      -1.1232 (-2.6634, 0.3762)
Educational material in the past two months? (Yes, No)
  No                -0.2420 (-0.9397, 0.4315)      -0.2424 (-0.9402, 0.4311)
Gender identity (Male, Transvestite, Others)
  Transvestite       0.9105 (0.0288, 1.7532)        0.9104 (0.0287, 1.7532)
  Others             0.8312 (-0.0796, 1.6815)       0.8315 (-0.0790, 1.6822)
Age category (< 25, > 25)
  > 25               1.0733 (0.4334, 1.7449)        1.0733 (0.4334, 1.7450)
Race (White, Black/mulatto, Other)
  Black/mulatto      0.5918 (-0.0375, 1.2291)       0.5925 (-0.0367, 1.2300)
  Other             -0.0630 (-2.5805, 1.9089)      -0.0622 (-2.5790, 1.9121)
City of residence (Campinas, Other)
  Other             -0.9087 (-1.9181, -0.0244)     -0.9099 (-1.9196, -0.0257)
Brazilian criterion for purchase power (A/B/C, D/E)
  A/B/C (richest)   -0.2900 (-1.1786, 0.6633)      -0.2899 (-1.1785, 0.6635)
Belongs to a gay NGO? (Yes, No)
  No                 0.2521 (-0.7466, 1.3456)       0.2522 (-0.7466, 1.3457)
Received free condom? (Yes, No)
  No                -0.8730 (-2.0934, 0.1731)      -0.8728 (-2.0933, 0.1736)
Consider self as sex worker? (Yes, No)
  No                -1.3785 (-2.2522, -0.4881)     -1.3794 (-2.2534, -0.4891)
Any symptoms of STI in the past year? (Yes, No)
  No                -1.1053 (-1.7312, -0.4846)     -1.1059 (-1.7318, -0.4851)
Regular drug user? (Yes, No)
  No                -0.2307 (-1.0048, 0.5917)      -0.2306 (-1.0047, 0.5919)
UIAI in the last 2 months? (Yes, No)
  Yes                0.1832 (-0.4584, 0.8197)       0.1831 (-0.4585, 0.8197)

DIC                 320.14                         320.13
pD                   14.11                          14.13



